Generating Random Data

This is the random generator used by DTF. It is simple enough to reproduce in other languages, so a tool written in another language can generate exactly the same pseudo-random bytes as DTF. Data can then be validated without worrying that the other language's generator produces a different sequence.

NOTE: This implementation is NOT thread-safe, so create a separate instance of DTFRandom for each thread.
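The same rule applies in C. The port further down this page keeps its state in a single static variable that every thread shares. Below is a minimal sketch of the per-instance alternative. The dtf_rng type and its functions are hypothetical names used only for illustration and are not part of DTF. Because each thread owns its own dtf_rng, no locking is needed.

```c
#include <stdint.h>

/* Hypothetical per-instance generator state (not a DTF API). Give each
 * thread its own dtf_rng so threads never share generator state. */
typedef struct {
    uint64_t state; /* must be non-zero: a zero state stays zero forever */
} dtf_rng;

void dtf_rng_init(dtf_rng *r, uint64_t seed) {
    r->state = seed;
}

/* One xorshift step with shifts (21, 35, 4), mirroring next(32): only this
 * instance's state is read and written. */
int32_t dtf_rng_next_int(dtf_rng *r) {
    uint64_t x = r->state;
    x ^= x << 21;
    x ^= x >> 35; /* logical shift on unsigned, like Java's >>> */
    x ^= x << 4;
    r->state = x;
    /* low 32 bits, reinterpreted as signed the way Java's (int) cast does */
    return (int32_t)(uint32_t)x;
}
```

Two instances seeded with the same value produce the same sequence however calls to other instances are interleaved. That independence lets per-thread generators still be used for reproducible data.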

The following is based on information taken from here and re-implemented within DTF. It uses the XORShift random number generators that George Marsaglia published in 2003. Marsaglia showed that a generator built from three shifts and three XOR operations, with suitably chosen shift values, has a "full" period. He also showed that its output passes statistical tests of randomness that linear congruential generators (LCGs, the algorithm used by Java's java.util.Random) fail.

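The "magic" shift values 21, 35 and 4 have been found to produce good results. With these values the generator has the full period of 2^64-1: every non-zero 64-bit state is visited once before the sequence repeats, and zero is excluded because it maps to itself. The resulting values pass Marsaglia's "Diehard battery" of statistical tests for randomness.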
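A single step of the generator with these shift values can be sketched in C as follows. xorshift64_step is an illustrative name, not a DTF function. Unsigned 64-bit arithmetic reproduces Java's behaviour. The << operator discards bits shifted past bit 63, and >> on an unsigned value is a logical shift, like Java's >>>.

```c
#include <stdint.h>

/* One step of Marsaglia's 64-bit xorshift with the shift triple (21, 35, 4).
 * Works on unsigned values so that >> never sign-extends. */
uint64_t xorshift64_step(uint64_t x) {
    x ^= x << 21;
    x ^= x >> 35;
    x ^= x << 4;
    return x;
}
```

Worked through by hand, a state of 1 becomes 0x200001 after the first XOR. The right shift leaves it unchanged because the value is below 2^35, and XORing with 0x2000010 gives 0x2200011. Starting from only the top bit set (0x8000000000000000), the logical right shift contributes bit 28 and the result is 0x8000000110000000. A sign-extending shift on a signed type would give a different value, which is why the Java code uses >>>. A state of 0 always stays 0.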

Here is how random data is currently generated within DTF:


Java

Sample code showing how DTF generates its pseudo-random data. The full source is in src/java/com/dtf/yahoo/util/DTFRandom.java.

 public void nextBytes(byte[] bytes) {
     for (int i = 0, len = bytes.length; i < len; ) {
         for (int rnd = nextInt(),
              n = Math.min(len - i, Integer.SIZE/Byte.SIZE);
              n-- > 0; rnd >>= Byte.SIZE) {
             // not allowing $ because then a property could be accidentally
             // created, and this would lead to unnecessary issues.
             byte b = (byte)rnd;
             if ( b == '$' ) continue;
             bytes[i++] = b;
         }
     }
 }
 
 public int nextInt() {
     return next(Integer.SIZE);
 }
 
 public boolean nextBoolean() {
     return next(1) != 0;
 }
   
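 // seed is the 64-bit xorshift state; it must be non-zero, because a
 // zero state stays zero forever.
 protected int next(int nbits) {
     long x = this.seed;
     x ^= (x << 21);
     // unsigned shift so we don't drag the sign bit
     x ^= (x >>> 35);
     x ^= (x << 4);
     // store the new state so the next call advances the sequence
     this.seed = x;
     // return only the low nbits bits (nbits <= 32)
     x &= ((1L << nbits) - 1);
     return (int) x;
 }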
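The inner loop of nextBytes uses the bytes of each nextInt() value lowest byte first. A '$' byte is dropped rather than replaced, and the outer loop then calls nextInt() again to fill the remaining positions. The sketch below isolates that inner loop in C, the language of the port that follows. split_int_bytes is a hypothetical helper, not part of DTF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of DTF) mirroring the inner loop of
 * nextBytes: copies up to max_bytes (at most 4) bytes of one 32-bit value
 * into out, lowest byte first, dropping any '$'. Returns bytes written. */
size_t split_int_bytes(int32_t value, size_t max_bytes, char *out) {
    uint32_t rnd = (uint32_t)value;
    size_t written = 0;
    size_t n = max_bytes < 4 ? max_bytes : 4;
    for (; n > 0; n--, rnd >>= 8) {
        char b = (char)(rnd & 0xFF);
        /* a '$' still uses up one of the n byte positions of this int */
        if (b == '$') continue;
        out[written++] = b;
    }
    return written;
}
```

For example, 0x24A1B2C3 yields the bytes 0xC3, 0xB2 and 0xA1. Its top byte is 0x24, which is '$', so only three bytes are written.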
 

The following example shows how to reproduce DTF's random byte generator in C; the same approach works in C++. The source file for this sample is src/native/random.c in DTF. It should contain a few comments on how to build it and run the sample test it includes.

C/C++
 
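 /* Port of DTFRandom to C99 (also compiles as C++). Unsigned fixed-width
  * types give Java's semantics: 64-bit wrap-around on << and a logical >>
  * that matches Java's >>>. */
 #include <stdint.h>
 
 #define INTEGER_SIZE 32  /* bits in an int, like Java's Integer.SIZE */
 #define BYTE_SIZE     8  /* bits in a byte, like Java's Byte.SIZE   */
 
 /* 64-bit generator state; must be non-zero (a zero state stays zero). */
 static uint64_t rseed = 1234567890ULL;
 
 int next(int nbits);
 int nextInt(void);
 
 static int min(int a, int b) {
     return (a < b ? a : b);
 }
 
 void nextBytes(char* bytes, int length) {
     int i, len, n;
     uint32_t rnd;
 
     for (i = 0, len = length; i < len; ) {
          for (rnd = (uint32_t)nextInt(),
               n = min(len - i, INTEGER_SIZE/BYTE_SIZE);
               n-- > 0; rnd >>= BYTE_SIZE) {
             // not allowing $ because then a property could be accidentally
             // created, and this would lead to unnecessary issues.
             char b = (char)rnd;
             if ( b == '$' ) continue;
             bytes[i++] = b;
          }
      }
 }
 
 int nextInt(void) {
     return next(INTEGER_SIZE);
 }
 
 int nextBoolean(void) {
     return next(1) != 0;
 }
  
 int next(int nbits) {
     uint64_t x = rseed;
     x ^= (x << 21);
     x ^= (x >> 35);               /* logical shift, same as Java's >>> */
     x ^= (x << 4);
     rseed = x;                    /* advance the state, as the Java code does */
     x &= ((1ULL << nbits) - 1);   /* keep the low nbits bits (nbits <= 32) */
     return (int)(uint32_t)x;      /* two's-complement wrap, like Java's (int) */
 }
  
 /* Seeds the generator. Note: this name clashes with POSIX random() if
  * <stdlib.h> is included in the same file. */
 void random(int64_t seed) {
     rseed = (uint64_t)seed;
 }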
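With a shared seed, validating data means regenerating the expected bytes and comparing them. The sketch below assumes the data came from a single nextBytes call of the same length on a freshly seeded generator. The names dtf_next32, dtf_fill and dtf_validate are hypothetical. The generator is repeated here with an explicit state so the block compiles on its own. The single-call assumption matters because nextBytes discards the unused bytes of the last int it draws. One 512-byte call is therefore not guaranteed to match two 256-byte calls.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One DTF xorshift step; returns the low 32 bits, like next(32). */
static uint32_t dtf_next32(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 21;
    x ^= x >> 35;
    x ^= x << 4;
    *state = x;
    return (uint32_t)x;
}

/* Same algorithm as nextBytes(): lowest byte first, '$' bytes dropped. */
void dtf_fill(uint64_t seed, char *bytes, size_t length) {
    uint64_t state = seed;
    size_t i = 0;
    while (i < length) {
        uint32_t rnd = dtf_next32(&state);
        size_t n = length - i < 4 ? length - i : 4;
        for (; n > 0; n--, rnd >>= 8) {
            char b = (char)(rnd & 0xFF);
            if (b == '$') continue;
            bytes[i++] = b;
        }
    }
}

/* Returns 1 if data equals one nextBytes call of this length from a
 * generator seeded with seed, 0 on mismatch, -1 if allocation fails. */
int dtf_validate(uint64_t seed, const char *data, size_t length) {
    char *expected = malloc(length > 0 ? length : 1);
    if (expected == NULL) return -1;
    dtf_fill(seed, expected, length);
    int ok = memcmp(expected, data, length) == 0;
    free(expected);
    return ok;
}
```

A producer written in any language fills its payload with the same algorithm and seed. The consumer calls dtf_validate on what it receives, and a single corrupted byte makes the comparison fail.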